Using Spatial Language in Multi-Modal Knowledge Capture
Abstract
The ability to understand and communicate spatial relationships is central to many human-level reasoning tasks. People often describe spatial relationships using prepositions (e.g., in, on, under). Being able to use and interpret spatial prepositions could help create interactive systems for many tasks, including knowledge capture. Here I describe my thesis work modeling the learning and use of spatial prepositions and applying this model to the task of knowledge capture.

Problem Being Addressed

Spatial relationships play an important role in many reasoning tasks, such as navigation and solving physics and engineering problems. Because space is such an important component of so many tasks, humans have developed specialized language for describing spatial relationships (i.e., prepositions such as in and on). Ideally, intelligent systems would be able to understand and use spatial language in their interactions with human users, particularly when doing visual-spatial tasks. Unfortunately, many systems modeling the use of spatial prepositions have had serious limitations. Many systems (e.g., Regier, 1995; Gapp, 1995) operate only on geometric shapes, not real-world objects. While interesting, such systems are hard to extend: psychology has repeatedly shown (e.g., Herskovits, 1985; Coventry & Garrod, 2004) that the functional features of objects also play a prominent role in the use of spatial prepositions, so systems that operate only on geometry are not easily carried over to real-world tasks. When systems do address real-world objects, they typically rely on hand-annotated databases of objects, restricting their scalability (e.g., Regier & Carlson, 2001). My thesis work seeks to create a domain-independent, scalable computational model of spatial preposition use. The utility of the model will be shown by using spatial language to resolve ambiguities in cooperative, multimodal (sketch + text) knowledge capture tasks.
Proposed Plan for Research

My research plan can be divided into two stages: a model of the formation of spatial categories, and a system for knowledge capture in a multi-modal environment. The model is being built using a series of sketches depicting real-world objects in simple spatial relationships (in, on, above, below, and left). All sketches are created using sKEA (Forbus, Ferguson, & Usher, 2001), the first open-domain sketching system. sKEA operates free of domain restrictions because it does not attempt to do recognition by default, relying instead on an interface mechanism that enables users to label pieces of ink with concepts drawn from a knowledge base. (The KB is a subset of ResearchCyc, containing over 35,000 concepts.) sKEA is ideal for investigating models of spatial language because it gives access both to perceptual information (computed from the ink in a sketch) and to functional information (based on the link to the underlying KB). The sketches will be run through SEQL (Kuehne et al., 2000), a relational generalization algorithm. Given a set of cases represented as sets of predicate calculus facts, SEQL divides them into generalizations (categories) based on similarity, using structure mapping (Gentner, 1983). The output of SEQL is a set of generalizations, each consisting of a list of facts. Each fact has an associated probability based on the number of cases it appears in (Halstead & Forbus, 2005). By altering the pre-processing of the sketch cases before generalization, I plan to show that sketches can be automatically categorized based on the spatial relationship depicted. Further, by examining the facts associated with each category/generalization, I can determine which facts were key to the formation of the generalizations.
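As a minimal sketch of this generalization step (not SEQL itself, which uses SME structure mapping over full relational structure), the following Python stand-in greedily assimilates fact-set cases into generalizations and derives the per-fact probabilities described above. The Jaccard similarity, the threshold value, and the fact vocabulary are all invented for illustration:

```python
from collections import Counter

def similarity(case, gen_facts):
    """Jaccard overlap between a case's facts and a generalization's
    facts -- a crude stand-in for SME structural similarity."""
    return len(case & gen_facts) / len(case | gen_facts)

def generalize(cases, threshold=0.4):
    """Greedy SEQL-style assimilation (simplified).

    Each generalization counts how many of its member cases contain
    each fact; dividing by the member count gives the per-fact
    probabilities described in the text (cf. Halstead & Forbus, 2005).
    """
    gens = []  # each: {"count": int, "facts": Counter}
    for case in cases:
        best, best_sim = None, threshold
        for g in gens:
            sim = similarity(case, set(g["facts"]))
            if sim > best_sim:
                best, best_sim = g, sim
        if best is None:
            # No generalization is similar enough: start a new one.
            gens.append({"count": 1, "facts": Counter(case)})
        else:
            best["count"] += 1
            best["facts"].update(case)
    # Convert raw counts to per-fact probabilities.
    return [{f: n / g["count"] for f, n in g["facts"].items()} for g in gens]
```

Run on two "on" sketches and one "in" sketch (facts written as predicate tuples), this yields two generalizations; shared facts such as contact get probability 1.0, while idiosyncratic facts such as the reference object being a Table get 0.5.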
These sets of facts, along with their associated probabilities, will then be converted into evidential rules for SpaceCase, my Bayesian model, which labels the spatial relationships in novel sketches with the correct preposition. This process describes how I will bootstrap a system that learns to create and use representations of spatial prepositions to correctly label relationships in sketches. Once this system is sufficiently accurate, i.e., able to correctly label a library of new (not seen in training) sketches based on stimuli from psychological experiments, I plan to demonstrate its usefulness as a disambiguation tool during knowledge capture tasks. For the knowledge capture component, I am building a dialogue system within the Companions System architecture (Forbus & Hinrichs, 2006). This system will help the user and the Companion cooperatively execute multi-modal knowledge capture tasks. Multimodal knowledge capture is important because many textbooks are a mixture of diagrams and text that must be examined in tandem to understand the concepts being communicated. Consider

Copyright © 2007, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
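To make the evidential-rule idea concrete, here is a hedged naive-Bayes stand-in for the kind of labeling SpaceCase performs. The predicate names and probabilities below are invented for illustration, and the actual SpaceCase rule format may differ; the point is only that per-fact probabilities from generalization can score candidate prepositions for a novel sketch:

```python
import math

# Hypothetical per-fact probabilities of the kind generalization would
# produce, keyed by preposition. All names and numbers are invented.
MODEL = {
    "on": {("above", "o", "r"): 0.95, ("contact", "o", "r"): 0.9,
           ("supports", "r", "o"): 0.8},
    "in": {("containedIn", "o", "r"): 0.9, ("isa-container", "r"): 0.85},
    "above": {("above", "o", "r"): 0.95, ("contact", "o", "r"): 0.05},
}
EPS = 0.01  # floor probability for facts a category has never seen

def label(facts, model=MODEL):
    """Return the preposition maximizing a naive-Bayes log score.

    Present facts contribute log p; facts the category expects but the
    sketch lacks contribute log (1 - p). A uniform prior over
    prepositions is assumed, so it drops out of the comparison.
    """
    best, best_score = None, -math.inf
    for prep, fact_probs in model.items():
        vocab = set(fact_probs) | set(facts)
        score = 0.0
        for f in vocab:
            p = fact_probs.get(f, EPS)
            score += math.log(p) if f in facts else math.log(1 - p)
        if score > best_score:
            best, best_score = prep, score
    return best
```

For example, a sketch whose facts include both above and contact between object and reference scores highest under "on", while above without contact falls to "above"; this mirrors how evidence combines rather than any single fact deciding the label.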